VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Authors

Abstract

Recently, computer vision foundation models such as CLIP and ALIGN have shown impressive generalization capabilities on various downstream tasks, but their ability to handle long-tailed data remains to be proven. In this work, we present a novel framework for long-tailed recognition (LTR) based on pre-trained visual-linguistic models, termed VL-LTR, and conduct empirical studies on the benefits of introducing the text modality into long-tailed recognition. Compared with existing approaches, the proposed VL-LTR has the following merits: (1) it learns not only visual representations from images but also the corresponding linguistic representations from noisy class-level text descriptions collected from the Internet; (2) it effectively uses the learned visual-linguistic representations to improve visual recognition performance, especially for classes with fewer image samples. We also conduct extensive experiments and set new state-of-the-art performance on widely used LTR benchmarks. Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which outperforms the previous best method by over 17 points and is close to the prevailing performance of training on the full ImageNet. Code is available at https://github.com/ChangyaoTian/VL-LTR
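The core idea in the abstract, recognizing an image by comparing it with class-level text descriptions, can be illustrated with a generic CLIP-style sketch. This is not VL-LTR's actual method: the function names, the choice of averaging each class's sentence embeddings into one text prototype, the temperature of 0.07, and the toy 3-d vectors are all illustrative assumptions. In practice the embeddings would come from pre-trained image and text encoders.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def class_text_prototype(sentence_embeddings: List[Sequence[float]]) -> List[float]:
    """Hypothetical helper: average the unit-normalized embeddings of one
    class's (noisy) text descriptions, then renormalize the mean."""
    normed = [l2_normalize(e) for e in sentence_embeddings]
    dim = len(normed[0])
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)


def classify(image_embedding: Sequence[float],
             class_texts: Dict[str, List[Sequence[float]]],
             temperature: float = 0.07) -> Dict[str, float]:
    """Hypothetical helper: softmax over classes of the cosine similarity
    between the image embedding and each class's text prototype."""
    img = l2_normalize(image_embedding)
    logits = {}
    for name, sentences in class_texts.items():
        proto = class_text_prototype(sentences)
        cosine = sum(a * b for a, b in zip(img, proto))
        logits[name] = cosine / temperature  # assumed temperature scaling
    # Numerically stable softmax: subtract the largest logit first.
    top = max(logits.values())
    exps = {k: math.exp(v - top) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}


if __name__ == "__main__":
    # Toy embeddings standing in for real encoder outputs (hypothetical data).
    texts = {
        "goldfish": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "tabby_cat": [[0.1, 0.9, 0.2], [0.0, 1.0, 0.0]],
    }
    probs = classify([0.85, 0.15, 0.05], texts)
    print(max(probs, key=probs.get))
```

Because a class's text prototype does not depend on how many training images that class has, this kind of text-side signal is one plausible reason a language modality could help tail classes; the sketch only shows the matching step, not any training procedure.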


Similar resources

Machine learning based Visual Evoked Potential (VEP) Signals Recognition

Introduction: Visual evoked potentials (VEPs) contain diagnostic information that has proved important for assessing the functional integrity of the visual system. Because of the substantial decrease in amplitude under extra-macular stimulation in commonly used pattern VEPs, differentiating normal from abnormal signals can be quite an obstacle. Due to developments of use of machine l...


Visual Object Recognition Through One-Class Learning

In this paper, several one-class classification methods are investigated in pixel space and in PCA (Principal Component Analysis) subspace, with the aim of finding suitable learning and classification methods to support natural language grounding in the context of Human-Robot Interaction. Face and non-face classification is used as an example to demonstrate the effectiveness of these one-cla...


The Effect of Visual Representation, Textual Representation, and Glossing on Second Language Vocabulary Learning

In this study, the researcher chose three different vocabulary techniques (Visual Representation, Textual Enhancement, and Glossing) and compared them with the traditional method of teaching vocabulary. Eighty advanced EFL learners were assigned to four intact groups (three experimental and one control group) using a proficiency test and a vocabulary test as a pre-test. In the visual group, stu...


Representation and Learning of Visual Information for Pose Recognition

Recovering position from sensor information is an important problem in mobile robotics, known as localisation. Localisation requires a map or some other description of the environment to provide the robot with a context in which to interpret sensor data. The mobile robot system under discussion uses an artificial neural representation of position. Building a geometrical map of the environment with a...


Visual Object Class Recognition

This dissertation implements, compares and evaluates different methods that can be used to make inferences about the existence of a specific object class in images. Initially, a visual vocabulary is created from the training data. Afterwards, the image content is expressed as an image descriptor using this visual vocabulary. Finally, different classification methods are used to make inference abo...



Journal

Journal title: Lecture Notes in Computer Science

Year: 2022

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-031-19806-9_5